The Universality of Linearity
MATH005 Lesson 7
00:00

The Universality of Linearity is perhaps the most powerful shortcut in probability theory. It allows us to calculate the expectation of a sum of random variables by simply summing their individual expectations—regardless of whether those variables are independent, correlated, or otherwise dependent in arbitrarily complicated ways.

1. Foundations & Proposition 2.1

To understand why expectation behaves so linearly, we look at the Law of the Unconscious Statistician (LOTUS) for multivariate systems. Proposition 2.1 states that if $X$ and $Y$ have a joint probability mass function $p(x, y)$, then the expectation of any function $g(X, Y)$ is:

$$E[g(X, Y)] = \sum_{y} \sum_{x} g(x, y) p(x, y)$$

For continuous variables with joint PDF $f(x, y)$, the equivalent integral form is:

$$E[g(X, Y)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x, y) f(x, y) dx dy$$
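The discrete form of Proposition 2.1 is a direct double sum. A minimal sketch, using a small hypothetical joint pmf (the support and probabilities below are invented for illustration):

```python
# Discrete LOTUS: E[g(X, Y)] = sum over (x, y) of g(x, y) * p(x, y).
# Hypothetical joint pmf on {0, 1} x {0, 1}; probabilities sum to 1.
joint_pmf = {
    (0, 0): 0.2, (0, 1): 0.3,
    (1, 0): 0.1, (1, 1): 0.4,
}

def expectation(g, pmf):
    """Apply Proposition 2.1: sum g(x, y) * p(x, y) over the support."""
    return sum(g(x, y) * p for (x, y), p in pmf.items())

e_sum = expectation(lambda x, y: x + y, joint_pmf)  # E[X + Y]
e_x = expectation(lambda x, y: x, joint_pmf)        # E[X] = 0.5
e_y = expectation(lambda x, y: y, joint_pmf)        # E[Y] = 0.7
```

Note that `e_sum` equals `e_x + e_y`, which is exactly the linearity result derived in the next section.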

2. The Linearity Principle

By applying LOTUS to the function $g(X, Y) = X + Y$, we derive the central theorem of this lesson: $E[X + Y] = E[X] + E[Y]$. This extends naturally to any finite collection:

$E\left[\sum_{i=1}^n X_i\right] = \sum_{i=1}^n E[X_i]$

This is "universal" because it requires no assumptions about the joint distribution. Whether variables are independent or heavily dependent, the average of the sum is the sum of the averages.
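To see this universality concretely, here is a Monte Carlo sketch in which $Y = X^2$, so $X$ and $Y$ are about as dependent as possible, yet the average of the sum still matches the sum of the averages:

```python
import random

# Monte Carlo check that E[X + Y] = E[X] + E[Y] even when X and Y are
# strongly dependent: here Y = X**2 is a deterministic function of X.
random.seed(0)
n = 100_000
xs = [random.uniform(0, 1) for _ in range(n)]
ys = [x * x for x in xs]                      # heavily dependent on X

mean = lambda v: sum(v) / len(v)
lhs = mean([x + y for x, y in zip(xs, ys)])   # estimate of E[X + Y]
rhs = mean(xs) + mean(ys)                     # E[X] + E[Y] ~ 1/2 + 1/3
```

The two quantities agree to floating-point precision, because the sample version of linearity is just the distributive law of arithmetic; no independence is ever used.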

Example 2a: The Ambulance Problem

Consider an accident at location $X$ on a road of length $L$ and an ambulance at $Y$, where $X, Y \sim U(0, L)$ and are independent. Using the multivariate LOTUS to find $E[|X-Y|]$:

The joint PDF is $f(x, y) = 1/L^2$ for $0 \le x, y \le L$.

$$E[|X-Y|] = \int_0^L \int_0^L |x-y| \frac{1}{L^2} dx dy = \frac{L}{3}$$
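The double integral above can be sanity-checked by simulation. A sketch with $L = 1$ for concreteness:

```python
import random

# Monte Carlo check of the ambulance result E[|X - Y|] = L / 3 for
# independent X, Y ~ Uniform(0, L); taking L = 1 for simplicity.
random.seed(0)
L, n = 1.0, 200_000
est = sum(abs(random.uniform(0, L) - random.uniform(0, L))
          for _ in range(n)) / n              # estimate of E[|X - Y|]
```

The estimate lands close to $1/3$, matching the exact value $L/3$.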

3. Monotonicity and Bounds

Expectation preserves the order of random variables. If $X \ge Y$ for all outcomes, then $E[X] \ge E[Y]$. This follows from the fact that a nonnegative random variable has nonnegative expectation (Example 2b): since $X - Y \ge 0$, we have $E[X - Y] \ge 0$, and linearity gives $E[X] - E[Y] = E[X - Y] \ge 0$. Furthermore, if a variable is bounded such that $P\{a \le X \le b\} = 1$, then it follows that $a \le E[X] \le b$.
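The monotonicity argument can be written as a one-line chain, combining linearity with the nonnegativity of expectation:

$$X - Y \ge 0 \;\Rightarrow\; 0 \le E[X - Y] = E[X] - E[Y] \;\Rightarrow\; E[X] \ge E[Y]$$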

4. The Sample Mean (Example 2c)

Let $X_1, \dots, X_n$ be a sample from a distribution with mean $\mu$. The sample mean is defined as:

$$\bar{X} = \sum_{i=1}^{n} \frac{X_i}{n}$$

Due to linearity, $E[\bar{X}] = \frac{1}{n} \sum E[X_i] = \frac{n\mu}{n} = \mu$. The expected value of the sample mean is $\mu$, proving it is an unbiased estimator.
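A simulation sketch of unbiasedness (the Exponential distribution and parameters below are an illustrative choice, not from the lesson): draw many samples of size $n$, compute each sample mean, and check that their grand average sits near $\mu$.

```python
import random

# Unbiasedness of the sample mean: sample repeatedly from an
# Exponential(rate = 2) distribution, whose mean is mu = 0.5, and
# average the resulting sample means to estimate E[X-bar].
random.seed(0)
n, trials, mu = 10, 20_000, 0.5
sample_means = [
    sum(random.expovariate(2.0) for _ in range(n)) / n
    for _ in range(trials)
]
grand_avg = sum(sample_means) / trials        # estimates E[X-bar] = mu
```

Each individual sample mean fluctuates around $\mu$, but their average converges to $\mu$, exactly as $E[\bar{X}] = \mu$ predicts.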

⚠️ The Infinite Caveat
When one is dealing with an infinite collection of random variables $X_i, i \ge 1$, it is not necessarily true that $E[\sum_{i=1}^\infty X_i] = \sum_{i=1}^\infty E[X_i]$. The interchange of expectation and infinite sum is justified if either of the following holds:
  1. The $X_i$ are all nonnegative random variables.
  2. The series is absolutely convergent: $\sum_{i=1}^\infty E[|X_i|] < \infty$.